Abstract

Over the last decade, several data structures have been developed that, given a planar subdivision
and a probability distribution over the plane, answer point location queries in a way that is
fine-tuned for the distribution. All these methods suffer from the requirement that the
query distribution must be known in advance.

We present a new data structure for point location queries in planar triangulations. Our 
structure is asymptotically as fast as the optimal structures, but it requires no prior information 
about the queries. This is a 2-D analogue of the jump from Knuth's optimum binary search 
trees (discovered in 1971) to the splay trees of Sleator and Tarjan in 1985. While the former 
need to know the query distribution, the latter are statically optimal. This means that we can 
adapt to the query sequence and achieve the same asymptotic performance as an optimum static 
structure, without needing any additional information. 

1 Introduction 

We consider the problem of finding a statically optimal data structure for planar point location in 
triangulations. This problem and related problems have a long history that goes back to the dawn 
of computer science. Thus, before giving a formal description of the problem and of our results, let 
us first provide some background on the history and motivation behind our work. 

1.1 1-D History 

Comparison-based predecessor search constitutes one of the oldest problems in computer science:
given a set S from a totally ordered universe U, we would like to construct a data structure for
answering predecessor queries. In such a query, we are given an element $x \in U$, and we need to
return the largest $y \in S$ with $y < x$ (or $-\infty$, if no such $y$ exists). In the most general decision-tree
model, we are allowed to evaluate in each step an arbitrary function $f : U \to \{0, 1\}$ on $x$, where the
choice of $f$ may depend on the outcomes of the previous evaluations. The classic solution sorts S
during preprocessing and answers queries in $O(\log n)$ steps through binary search, where n denotes
the size of S. Information-theoretic arguments imply that any such comparison-based algorithm
requires $\Omega(\log n)$ steps in the worst case (see, e.g., Ailon et al. [2, Section 2] for more details).

However, the story does not end here. Early in the history of computer science, researchers 
realized that if the distribution of query outcomes is sufficiently biased, o(log n) expected-time query 
processing becomes possible. This insight led to the invention of optimal search trees. These are 
specialized data structures for the case that the query outcomes are drawn independently from a 
known fixed distribution, and a wide literature studying their variants and extensions has been
developed [10, 22–24, 28, 33, 35–37, 41–44, 51, 52]. In this context, optimality is characterized by the
entropy of the distribution: if $p_i$ denotes the probability of the $i$th outcome, the entropy $\mathcal{H}$ is defined
as $\mathcal{H} := -\sum_i p_i \log_2 p_i$. Information theory [48] shows that $\mathcal{H}$ is a lower bound for the expected number
of steps that any comparison-based algorithm needs to answer a predecessor query, assuming that
the searches are drawn independently from a fixed distribution (e.g., [2, Claim 2.2]).
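To make the definition of $\mathcal{H}$ concrete, here is a minimal Python sketch (the helper name `entropy` is ours, not the paper's) that evaluates it for a skewed distribution on n outcomes. For such a distribution $\mathcal{H}$ is far below $\log_2 n$, which is exactly the regime in which $o(\log n)$ expected query time becomes possible.

```python
import math

def entropy(probs):
    """Shannon entropy H = -sum p_i log2 p_i; zero-probability outcomes contribute 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A skewed distribution over n outcomes: one outcome has probability 1 - 1/n,
# and the remaining n - 1 outcomes share the leftover mass 1/n equally.
n = 1024
skewed = [1 - 1 / n] + [1 / (n * (n - 1))] * (n - 1)
print(entropy(skewed), math.log2(n))  # H is a tiny fraction of log2(n) = 10
```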

All the above results require that the distribution, or a suitable approximation thereof, be known 
in advance. This situation changed in 1985, when Sleator and Tarjan [49] introduced splay trees.
These trees have many remarkable properties, among them static optimality. This
means that for any sufficiently long query sequence, splay trees are asymptotically as fast as optimal
static search trees, and they achieve this without any prior information on the query distribution.



1.2 2-D History 

Planar point location is a fundamental problem in computational geometry. A triangulation S is 
a partition of the plane into (possibly infinite) triangles. Given S, we need to construct a data 
structure for point location queries: given a point $p \in \mathbb{R}^2$, return the triangle of S that contains it.
Again, we use a decision-tree model. This means that in each step we may evaluate an arbitrary
function $f : \mathbb{R}^2 \to \{0, 1\}$ on $p$, where $f$ may depend on the previous comparisons.

There are several point location structures with $O(\log n)$ query time, which is optimal in our
decision-tree model. These structures are notable not only for achieving optimality, but for doing so
through very different methods, such as planar separators [39, 40], Kirkpatrick's successive refinement
approach [34], persistence [46], layered DAGs [20], or randomized incremental construction [45, 47].

Once again, it makes sense to consider biased query distributions. For a known fixed distribution 
of point location queries, there are several data structures that achieve optimal expected query time, 
assuming independence. These biased structures are analogous to optimal search trees. Thus, we 
can use the same information theoretic arguments to characterize the optimal expected query time 
by the entropy $\mathcal{H}$ of the probabilities of the queried regions [2, Claim 2.2].

A series of papers by Arya et al. [3]–[8] converges on two algorithms. The first one achieves query
time $\mathcal{H} + O(\sqrt{\mathcal{H}} + 1)$ with $O(n)$ space, while the second, simpler, algorithm supports queries in time
$(5 \ln 2)\mathcal{H} + O(1)$ and $O(n \log n)$ space.¹ The latter algorithm is a truly simple variant of randomized
incremental construction [45, 47], where the random choices are biased according to the distribution.
Both structures are randomized and have superlinear construction costs. Iacono [31] presented
a data structure that supports $O(\mathcal{H})$ time queries in $O(n)$ space. Unlike the aforementioned
results, it is deterministic and can be constructed in linear time, but it has terrible constants.



1.3 Creating a point location structure that is statically optimal 

In view of the developments for binary search trees, one question presents itself: Is there a point 
location structure that is asymptotically as fast as the biased structures, without explicit knowledge 
of the query distribution? Or, put differently, can a point location structure achieve a running time 
similar to the static optimality bound of splay trees? This open problem, which we resolve here, 
explicitly appears in several previous works on point location, e.g., in Arya et al. [8, Section 6]:

¹ In this context, query time refers to the expected depth of the associated decision tree.
Taking this in a different direction, suppose that the query distribution is not known
at all. That is, the probabilities that the query point lies within the various cells of
the subdivision are unknown. In the 1-dimensional case it is known that there exist
self-adjusting data structures, such as splay trees, that achieve good expected query
time in the limit. Do such self-adjusting structures exist for planar point location?

There are several possible approaches towards statically optimal point location. One, suggested 
above, would be to create some sort of self-adjusting point location structure and to analyze it in a 
way similar to splay trees. This has not been done; we suspect that the main stumbling block is
that all known efficient structures are comparison DAGs [20, 34, 45–47]: they can be represented as
a directed acyclic graph with a unique source and out-degree 2, such that each node corresponds
to a planar region. A point location query proceeds by starting at the source and by following in 
each step an edge that is determined by comparing the query point with a fixed line. The query 
continues until it reaches a sink, whose corresponding region constitutes the desired query outcome. 
In order to achieve reasonable space usage, it seems essential to use a DAG instead of a simple tree. 
Unfortunately, we do not know how to perform rotation-like local changes in such DAGs that would 
mimic the behavior of splay trees. 
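The query procedure on a comparison DAG can be sketched in a few lines of Python. This is an illustrative toy, not one of the cited structures: the classes `Leaf` and `LineTest` and the function `locate` are hypothetical names, and each comparison uses the standard orientation (cross-product) predicate to decide on which side of a directed line the query point lies. The sink for region A is shared by two parents, which is what makes the structure a DAG rather than a tree.

```python
from dataclasses import dataclass
from typing import Union

Point = tuple[float, float]

@dataclass(frozen=True)
class Leaf:
    region: str                 # sink: the region reported to the caller

@dataclass(frozen=True)
class LineTest:
    a: Point                    # the fixed line is directed from a to b
    b: Point
    left: "Node"                # followed if the query point is strictly left of a -> b
    right: "Node"               # followed otherwise (right of the line or on it)

Node = Union[Leaf, LineTest]

def orientation(a: Point, b: Point, p: Point) -> float:
    """Twice the signed area of triangle (a, b, p); positive iff p is left of a -> b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def locate(source: Node, p: Point) -> tuple[str, int]:
    """Follow one out-edge per comparison until a sink; return (region, comparisons)."""
    node, steps = source, 0
    while isinstance(node, LineTest):
        steps += 1
        node = node.left if orientation(node.a, node.b, p) > 0 else node.right
    return node.region, steps

# Toy subdivision: A = {x < 0}, B = {x >= 0, y > 0}, C = {x >= 0, y <= 0}.
A = Leaf("A")                                   # one sink reached from two parents
upward = ((0.0, -1.0), (0.0, 1.0))              # left of this line means x < 0
upper = LineTest(*upward, left=A, right=Leaf("B"))
lower = LineTest(*upward, left=A, right=Leaf("C"))
root = LineTest((-1.0, 0.0), (1.0, 0.0), left=upper, right=lower)  # left means y > 0

print(locate(root, (2.0, 3.0)))   # ('B', 2)
```

In a real structure the DAG is built so that the number of comparisons along each path matches the desired bound; here every path simply has length 2.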

Another possible avenue is to use splay trees in an existing structure. Goodrich et al. [26]
followed this approach, using essentially a hybrid of splay trees and the persistent line-sweep method.
Unfortunately, their method does not give a result optimal with respect to the entropy of the original
distribution of query outcomes, but rather with respect to the entropy of the probabilities of querying regions of
a strip decomposition of the triangulation. The latter is obtained by drawing vertical lines through
every vertex of the triangulation. This strip decomposition could split a high-probability triangle
into several parts and could potentially increase the entropy of the query result by $\Omega(\log n)$, the
worst possible; see Figure 1 for an example.



[Figure 1: panels (a) and (b); the label "n + 1 vertices" appears twice.]





Figure 1: A bad example for the strip decomposition of [26]: (a) We have n + 3 vertices and n + 1
triangles. The n small triangles each have query probability $1/n^2$; the large, shaded triangle has
query probability $1 - 1/n$. The entropy is $(1/n)\log n^2 + (1 - 1/n)\log(n/(n-1)) = O(1)$. (b) The
strip decomposition partitions the large triangle into n + 1 parts. Suppose each part has probability
$(n-1)/(n(n+1)) \approx 1/n$. The resulting entropy is larger than $(1 - 1/n)\log(n(n+1)/(n-1)) = \Omega(\log n)$.
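The two entropy estimates in the caption of Figure 1 can be checked numerically. The Python sketch below (function names are ours) evaluates the entropy of configuration (a) and the lower bound for configuration (b) with base-2 logarithms for a few values of n: the first stays bounded while the second tracks $\log_2 n$.

```python
import math

def entropy_a(n):
    """Entropy of configuration (a): n small triangles of probability 1/n^2 each
    and one shaded triangle of probability 1 - 1/n."""
    small = n * (1 / n**2) * math.log2(n**2)
    large = (1 - 1 / n) * math.log2(n / (n - 1))
    return small + large

def entropy_b_lower(n):
    """Contribution of the n + 1 strip pieces of the shaded triangle in (b),
    each of probability (n - 1) / (n (n + 1)); a lower bound on the entropy."""
    q = (n - 1) / (n * (n + 1))
    return (n + 1) * q * math.log2(1 / q)

for n in (16, 256, 4096):
    print(n, round(entropy_a(n), 4), round(entropy_b_lower(n), 2), math.log2(n))
```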

One might also try to create a structure with the working set property. This property, originally 
used in the analysis of splay trees, states that the processing time for a query q is logarithmic 
in the number of distinct queries since the last query that returned the same result as q. The 
working set property implies static optimality [30]; it has also proved useful in several other
contexts [15, 16, 21, 29, 32]. Most importantly, there is a general transformation from a dynamic
$O(\log n)$ time structure into one with the working set property [30]. Unfortunately, even though
several dynamic data structures for predecessor searching are known (e.g., AVL trees [1] or red-black
trees), it remains a prominent open problem to develop a point location structure that supports
insertions, deletions, and queries in $O(\log n)$ time. (Note that [50] claims to modify Kirkpatrick's
method to allow for $O(\log n)$ time insertions, deletions, and queries. The claimed result is wrong.²)
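The working set property is easy to state operationally. The Python sketch below computes, for each query, the number of distinct results since the previous query with the same result, and the resulting bound $\sum_t \log w_t$ under the convention $\log x := \max(1, \log_2 x)$ used later in this paper. Charging a first occurrence n is our assumption (conventions differ), and the function names are hypothetical; the sketch only evaluates the bound and does not implement a working-set structure.

```python
import math

def working_set_numbers(results, n):
    """For each query t, the number of distinct results since the previous query
    with the same result (counting that result once); first occurrences are charged n."""
    last_seen = {}
    numbers = []
    for t, s in enumerate(results):
        if s in last_seen:
            numbers.append(len(set(results[last_seen[s] + 1 : t + 1])))
        else:
            numbers.append(n)
        last_seen[s] = t
    return numbers

def working_set_bound(results, n):
    """sum_t log w_t, with log x := max(1, log2 x)."""
    return sum(max(1.0, math.log2(w)) for w in working_set_numbers(results, n))

results = ["a", "b", "a", "a", "c", "b"]
print(working_set_numbers(results, n=3))  # [3, 3, 2, 1, 3, 3]
```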

Our solution to the problem of statically optimal point location is very simple: we take a 
biased structure that needs to be initialized with distributional information, and we rebuild it 
periodically using the observed frequencies for each region. We do not store all the regions in the 
biased structure — this would make the rebuilding step too expensive. Instead, upon rebuilding we 
create a structure storing only the $n^\beta$ most frequent items observed so far, where $\beta \in (0, 1)$ is some
suitable constant. We resort to a static $O(\log n)$ time structure to complete queries for the remaining
regions. The rebuilding takes place after every $n^\alpha$ queries, for some constant $\alpha \in (\beta, 1 - \beta)$. This is
a simple and general method of converting biased structures into statically optimal ones, and it
enables us to waive the requirement of distributional knowledge present in all previous biased point
location structures, at least for triangulations. Our approach can be seen as a generalization and
simplification of a method by Goodrich for dictionaries [25].



2 Notation 



Let U be some universal set, and let S be a partition of U into n pieces. The elements of U are
called points; the subsets in S are called regions. A location query takes some point $p \in U$ and
returns the region $s \in S$ with $p \in s$. The result of a location query with input p is denoted by $q(p)$.
A data structure for location queries is called a location query structure.

Let $P = (p_1, p_2, \ldots, p_m)$ be a sequence of m queries, and let $Q := (q(p_1), q(p_2), \ldots, q(p_m))$
denote the results of these queries. Let $f_t(s)$ be the number of occurrences of s in the first t elements
of Q, and define $f(s) := f_m(s)$, the number of times s occurs in the entire sequence. Furthermore,
let $t_j(s)$ be the time of the $j$th occurrence of s in Q; thus $f_{t_j(s)}(s) = j$.
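The quantities $f_t(s)$ and $t_j(s)$ can be computed directly from the result sequence Q. In this small Python sketch (helper names hypothetical, 1-based times as in the text), `f_t` is a deliberately naive recount chosen for clarity rather than speed; the final line checks the identity $f_{t_j(s)}(s) = j$.

```python
from collections import defaultdict

def occurrence_times(Q):
    """times[s][j - 1] = t_j(s), the 1-based time of the j-th occurrence of s in Q."""
    times = defaultdict(list)
    for t, s in enumerate(Q, start=1):
        times[s].append(t)
    return dict(times)

def f_t(Q, t, s):
    """f_t(s): number of occurrences of s among the first t results of Q."""
    return sum(1 for x in Q[:t] if x == s)

Q = ["a", "b", "a", "c", "a"]
times = occurrence_times(Q)
print(times)  # {'a': [1, 3, 5], 'b': [2], 'c': [4]}
# The identity f_{t_j(s)}(s) = j from the text:
assert all(f_t(Q, t, s) == j for s, ts in times.items() for j, t in enumerate(ts, start=1))
```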

We use $\log x$ to refer to $\max(1, \log_2 x)$; this avoids clutter generated by additive terms that
would otherwise be needed to handle degenerate cases of our analysis. We next define the notion of 
a biased structure. 

Definition 2 Let S be a set of n regions, and let D be a location query structure for S. We say that
D is biased if the following holds: There exists a function $c_D : \mathbb{N} \to \mathbb{N}$ such that given any weight
function $w : S \to \mathbb{R}^+$, D executes any query sequence P in total time

$$O\Big(c_D(n) + \sum_{s \in S} f(s) \log \frac{W}{w(s)}\Big), \quad \text{where } W := \sum_{s \in S} w(s).$$

The function $c_D$ is called the construction cost of the structure.

Suppose we choose $w(s)$ roughly proportional to the number of queries that return the given region, e.g.,
$w(s) := f(s) + 1$. In this case, a biased location query structure achieves an amortized query time
that is (of the order of) the entropy $\mathcal{H}$ of the query distribution. As we argued in the introduction,
this is optimal for our decision-tree model. We now define the notion of static optimality. 
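To see this numerically, the Python sketch below (helper names hypothetical) evaluates the biased time bound $\sum_s f(s)\log(W/w(s))$ with $w(s) := f(s) + 1$ and divides by m. For readability it uses plain $\log_2$ rather than the clamped logarithm of this section; the per-query value comes out close to the empirical entropy of the result sequence, and with $w(s) := f(s)$ it equals m times that entropy exactly.

```python
import math
from collections import Counter

def biased_bound(freqs, weights):
    """sum_s f(s) * log2(W / w(s)), where W sums the weights of all regions in S."""
    W = sum(weights.values())
    return sum(f * math.log2(W / weights[s]) for s, f in freqs.items() if f > 0)

def empirical_entropy(freqs):
    """Entropy of the empirical distribution f(s) / m of the query results."""
    m = sum(freqs.values())
    return -sum(f / m * math.log2(f / m) for f in freqs.values() if f > 0)

Q = ["a"] * 6 + ["b"] * 2                     # here S = {a, b}
freqs = Counter(Q)
weights = {s: freqs[s] + 1 for s in freqs}    # w(s) := f(s) + 1
per_query = biased_bound(freqs, weights) / len(Q)
print(per_query, empirical_entropy(freqs))
```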



² The method presented makes the assumption that, given a triangle T in a triangulation of size n on which a
Kirkpatrick hierarchy has been built, the complexity of the intersection of T with any level of the hierarchy is constant;
this is false, as examples where the intersection has size $\sqrt{n}$ are easy to produce.
Definition 3 Let S be a set of n regions, and let D be a location query structure for S. We say
that D is statically optimal if there exists a function $c_D : \mathbb{N} \to \mathbb{N}$ such that D executes any query
sequence P of length m in total time³

$$O\Big(c_D(n) + \sum_{s \in S} f(s) \log \frac{m}{f(s)}\Big).$$

We call $c_D$ the construction cost of D.

Note that a statically optimal structure is given neither the frequency function f nor any weights
in advance; in particular, the structure need not be static.
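The bound in Definition 3 depends only on the result sequence. The sketch below (hypothetical helper names) evaluates $\sum_s f(s)\log(m/f(s))$ both with the clamped logarithm of this section and with plain $\log_2$. In the latter case the sum is exactly m times the empirical entropy of the results, which ties Definition 3 to the entropy discussion above; regions that are never queried do not appear in the `Counter` and thus contribute 0, matching the footnoted convention.

```python
import math
from collections import Counter

def clamped_log(x):
    """The convention of this paper: log x := max(1, log2 x)."""
    return max(1.0, math.log2(x))

def static_opt_sum(Q, clamped=True):
    """sum over queried regions s of f(s) * log(m / f(s))."""
    m = len(Q)
    log = clamped_log if clamped else math.log2
    return sum(f * log(m / f) for f in Counter(Q).values())

Q = ["a"] * 6 + ["b"] * 2
print(static_opt_sum(Q), static_opt_sum(Q, clamped=False))
```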

We provide a simple method for making a biased location query structure statically optimal, 
assuming a few technical conditions. The main such condition is that we should be able to construct 
the biased query structure not just on the set S, but on any subset S' of S. We require that a 
location query structure for S' performs as quickly as a biased structure for S when a region in S' 
is queried, and that it reports failure in $O(\log n)$ time if the query lies outside of S'. Formally:

Definition 4 Let S be a set of n regions, and let D be a location query structure. We call D subset-biased
on S if the following holds: there exists a function $c'_D : \mathbb{N} \to \mathbb{N}$ such that given a subset
$S' \subseteq S$ of size $n'$ and a weight function $w' : S' \to \mathbb{R}^+$, the structure D executes any query sequence
P of length m in time

$$O\Big(c'_D(n') + \sum_{s' \in S'} f(s') \log \frac{W'}{w'(s')} + \Big(m - \sum_{s' \in S'} f(s')\Big)\log n\Big), \quad \text{where } W' := \sum_{s' \in S'} w'(s').$$

For each query $p \in P$, we require that D reports the region $s' \in S'$ with $p \in s'$, if it exists, and that
D reports a failure otherwise. The function $c'_D$ is called the construction cost of the structure.
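The subset-biased structures used later are two-dimensional, but Definition 4 is stated for any universe U, so a one-dimensional analogue may help fix ideas. The sketch below is our own construction, not one from the paper: a search tree over the stored intervals of S' that always splits at the weighted median. Smoothing each weight by $W'/n'$ (an assumption we add) keeps every node at depth at most $\log_2(2W'/w'(s))$ and at most $\log_2(2n')$, so successful queries are biased and queries that fall into a gap fail after $O(\log n')$ comparisons. All names are hypothetical.

```python
import bisect
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    lo: float                      # the stored region is the interval [lo, hi)
    hi: float
    label: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def build_subset_biased(intervals, weights):
    """Weighted-median search tree over disjoint intervals (a subset S' of regions).

    Each weight w'(s) is smoothed to w'(s) + W'/n', so the total doubles to 2W'
    and every subtree below depth d carries at most a 2^-d fraction of it.
    """
    items = sorted(intervals)
    if not items:
        return None
    total = sum(weights[label] for _, _, label in items)
    smooth = total / len(items)
    prefix = [0.0]
    for _, _, label in items:
        prefix.append(prefix[-1] + weights[label] + smooth)

    def build(i, j):                      # build a tree on items[i:j]
        if i >= j:
            return None
        half = (prefix[i] + prefix[j]) / 2
        # weighted median: the first k with prefix[k + 1] >= half
        k = bisect.bisect_left(prefix, half, i + 1, j + 1) - 1
        lo, hi, label = items[k]
        return Node(lo, hi, label, build(i, k), build(k + 1, j))

    return build(0, len(items))

def query(root, x):
    """Return (label, comparisons), with label None when x lies outside S'."""
    node, steps = root, 0
    while node is not None:
        steps += 1
        if x < node.lo:
            node = node.left
        elif x >= node.hi:
            node = node.right
        else:
            return node.label, steps
    return None, steps

intervals = [(0.0, 1.0, "a"), (1.0, 2.0, "b"), (2.0, 3.0, "c"), (5.0, 6.0, "d")]
root = build_subset_biased(intervals, {"a": 100, "b": 1, "c": 1, "d": 1})
print(query(root, 0.5), query(root, 4.0))  # ('a', 1) (None, 3)
```

The heavy region "a" ends up at the root, while a query in the gap [3, 5) reports failure after a logarithmic number of steps.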

Note that $m - \sum_{s' \in S'} f(s')$ is the number of queries that result in failure. Given Definition 4,
we may now state our main theorem:

Theorem 5 Let S be a set of n regions. Suppose we have an $O(\log n)$ time location query structure
on S with construction cost $O(n)$ and a subset-biased structure on S with construction cost
$O(n' \log n')$. Then we can construct a statically optimal structure on S with construction cost $O(n)$.



3 The transformation 

We now describe the construction for Theorem 5. By assumption, we are given a set S of n regions,
and we have available an $O(\log n)$ time location query structure D on S with construction cost $O(n)$
as well as a subset-biased structure with construction cost $O(n' \log n')$.

³ By convention, $f(s) \log(m/f(s)) := 0$ if $f(s) = 0$.
3.1 Description of the structure

Let α and β be two constants such that $0 < \beta < \alpha < 1 - \beta < 1$ (e.g., $\alpha = 1/2$ and $\beta = 1/3$). The
simple idea behind our transformation is as follows: after every $n^\alpha$ queries, we build a subset-biased
structure for the $n^\beta$ most commonly accessed regions, in $O(n^\beta \log n) = o(n^\alpha)$ time. We also keep a
static $O(\log n)$ time structure as a backup for failed queries in the subset-biased structure. Formally,
the structure has several parts:

1. A static $O(\log n)$ query time structure.

2. A structure that keeps track of how often each region was queried and that is capable of
reporting the k most popular regions in $O(k)$ time. Since in each step we increment the count
for a single region by 1, we can easily maintain such a structure in linear space and constant
time per update. (The additional space overhead can be made sublinear at the expense of
determinism through the use of a streaming algorithm for the so-called heavy hitters problem
(e.g., [12]). This shows that our transformation is also useful in a context where additional
space is at a premium, for example for implicit data structures or when the data resides in
read-only memory.⁴)

3. A subset-biased structure D' that is built after $2n^\alpha$ queries and rebuilt every $n^\alpha$ queries
thereafter. The structure contains the at most $n^\beta$ most popular regions at the time of the
rebuilding that have been queried at least $2n^\alpha$ times. In the choice of these regions, we break
ties arbitrarily. The weight of a region s, denoted $w'(s)$, is the number of queries to s at the
time of the rebuilding. More precisely, if the rebuilding is at time t, we set $w'(s) := f_t(s)$.
Computing the $n^\beta$ most popular regions and the weight function w' takes time $O(n^\beta)$ with
the structure from Part 2. By assumption, the construction cost of D' is $O(n^\beta \log n)$.
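Part 2 leaves the counting structure unspecified. One standard way to get constant-time increments and $O(k)$-time reports, sketched here in Python as an assumption rather than as the authors' design, keeps regions with equal counts in a common bucket and links the nonempty buckets in increasing order of count (the layout commonly used for LFU caches). Because every count grows by exactly one, an increment only moves a region to the neighboring bucket. The class name `FrequencyTable` is hypothetical.

```python
class FrequencyTable:
    """Per-region query counts with O(1) increments and O(k) top-k reports."""

    class _Bucket:
        __slots__ = ("count", "items", "prev", "next")

        def __init__(self, count):
            self.count = count
            self.items = {}      # dict used as an insertion-ordered set of regions
            self.prev = None     # bucket with the next smaller count
            self.next = None     # bucket with the next larger count

    def __init__(self):
        self._bucket_of = {}     # region -> bucket holding it
        self._head = None        # bucket with the smallest count
        self._tail = None        # bucket with the largest count

    def _insert_after(self, prev, count):
        """Create a bucket for `count` right after `prev` (at the front if None)."""
        b = self._Bucket(count)
        nxt = prev.next if prev is not None else self._head
        b.prev, b.next = prev, nxt
        if prev is not None:
            prev.next = b
        else:
            self._head = b
        if nxt is not None:
            nxt.prev = b
        else:
            self._tail = b
        return b

    def _unlink(self, b):
        if b.prev is not None:
            b.prev.next = b.next
        else:
            self._head = b.next
        if b.next is not None:
            b.next.prev = b.prev
        else:
            self._tail = b.prev

    def increment(self, s):
        """Record one more query to region s in O(1) time."""
        old = self._bucket_of.get(s)
        count = old.count + 1 if old is not None else 1
        nxt = old.next if old is not None else self._head
        new = nxt if nxt is not None and nxt.count == count else self._insert_after(old, count)
        new.items[s] = None
        self._bucket_of[s] = new
        if old is not None:
            del old.items[s]
            if not old.items:    # keep only nonempty buckets in the list
                self._unlink(old)

    def count(self, s):
        b = self._bucket_of.get(s)
        return b.count if b is not None else 0

    def top(self, k):
        """Up to k (region, count) pairs in non-increasing count order, in O(k) time."""
        out, b = [], self._tail
        while b is not None and len(out) < k:
            for s in b.items:
                if len(out) == k:
                    break
                out.append((s, b.count))
            b = b.prev
        return out

ft = FrequencyTable()
for s in "abacabad":
    ft.increment(s)
print(ft.top(2))  # [('a', 4), ('b', 2)]
```

Since every bucket in the list is nonempty, `top(k)` touches at most k buckets.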

A search is executed on the subset-biased structure first. If it fails (at amortized cost $O(\log n)$), it
is executed in the static $O(\log n)$ time structure.
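Putting parts 1 to 3 together, the query loop can be sketched in Python as follows. This is a hedged illustration of the control flow only: `static_locate` and `build_biased` are hypothetical callables standing in for the $O(\log n)$ structure and the subset-biased structure, the counting part uses `collections.Counter` (whose `most_common` is not the $O(k)$ structure of Part 2), and the toy one-dimensional instance at the end is ours.

```python
from collections import Counter

class StaticallyOptimalLocator:
    """Sketch of the rebuild-based transformation (class name hypothetical)."""

    def __init__(self, n, static_locate, build_biased, alpha=1/2, beta=1/3):
        assert 0 < beta < alpha < 1 - beta < 1
        self.n_alpha = max(1, round(n ** alpha))   # rebuild period n^alpha
        self.n_beta = max(1, int(n ** beta))       # at most n^beta stored regions
        self.static_locate = static_locate         # part 1: p -> region
        self.build_biased = build_biased           # part 3: weights -> (p -> region or None)
        self.counts = Counter()                    # part 2 (not O(k) per report)
        self.biased = None                         # D', absent before 2 n^alpha queries
        self.t = 0
        self.fallbacks = 0                         # queries answered by part 1

    def query(self, p):
        self.t += 1
        s = self.biased(p) if self.biased is not None else None
        if s is None:                              # failure in D' (or no D' yet)
            self.fallbacks += 1
            s = self.static_locate(p)
        self.counts[s] += 1
        first = 2 * self.n_alpha
        if self.t >= first and (self.t - first) % self.n_alpha == 0:
            self._rebuild()
        return s

    def _rebuild(self):
        # Weights w'(s) := f_t(s) for the top n^beta regions queried >= 2 n^alpha times.
        top = {s: c for s, c in self.counts.most_common(self.n_beta)
               if c >= 2 * self.n_alpha}
        self.biased = self.build_biased(top) if top else None

# Toy 1-D instance: region i is the interval [i, i + 1) for i in range(16).
def static_locate(p):
    return int(p)                                  # stand-in for an O(log n) structure

def toy_biased(weights):
    return lambda p: int(p) if int(p) in weights else None

loc = StaticallyOptimalLocator(16, static_locate, toy_biased)
answers = [loc.query(3.5) for _ in range(20)]
print(loc.fallbacks)  # 8: only the queries before the first build use the backup
```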

3.2 Initial analysis of structure 

We will now analyze the properties of our structure. Our first lemma describes a key property 
of the rebuilding process: for any sufficiently popular region s, the amortized query time for s is 
proportional to the amortized query time a biased structure would achieve if it were weighted with 
the frequencies observed so far. 

Lemma 6 Consider the query $p_t$ at time t, and let $s := q(p_t)$ denote the resulting region. Suppose
that $f_t(s) \geq 2n^\alpha$. Then the amortized cost for query $p_t$ is $O\big(\log \frac{t}{f_t(s) - n^\alpha}\big)$.

Proof: Since $f_t(s) \geq 2n^\alpha$, we have $t \geq 2n^\alpha$. Thus, we first query the subset-biased structure D'.
Suppose that D' was last rebuilt at time $t' \geq t - n^\alpha$. There are two cases.

Suppose first that s is contained in D'. Definition 4 ensures that the amortized time for the
query in D' is $O(\log(W'/f_{t'}(s)))$, where W' denotes the total number of queries for the regions in
D' at time t'. We have $W' \leq t$ (there have been t queries so far) and $f_{t'}(s) \geq f_t(s) - n^\alpha$ (there have
been at most $n^\alpha$ queries since rebuilding). The lemma follows.

Now suppose that s is not in D'. In this case, the query takes $O(\log n)$ amortized time in D'
and $O(\log n)$ time in the static structure. We know that at time t', there were $n^\beta$ regions at least as
popular as s. Thus, $n^\beta f_{t'}(s) \leq t' \leq t$. It follows that

$$\beta \log n = \log n^\beta \leq \log \frac{t}{f_{t'}(s)} \leq \log \frac{t}{f_t(s) - n^\alpha},$$

and the claimed bound suffices to account for the $O(\log n)$ query time. □

Using Lemma 6, we can now bound the running time in terms of the query frequencies.



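Lemma 7 Let S be a set of n regions. Our structure executes any query sequence P on S of length
m in time

$$O\Bigg(\sum_{s \in S}\Big(\underbrace{\min(f(s), 2n^\alpha)\log n}_{\text{first } 2n^\alpha \text{ queries to } s} + \underbrace{\sum_{j=2n^\alpha}^{f(s)} \log \frac{t_j(s)}{f_{t_j(s)}(s) - n^\alpha}}_{\text{queries to } s \text{ after the } 2n^\alpha\text{th}}\Big) + \underbrace{\Big\lfloor \frac{m}{n^\alpha} \Big\rfloor n^\beta \log n}_{\text{rebuild biased structure}} + \underbrace{n}_{\substack{\text{static structure}\\ \text{construction}}}\Bigg).$$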



Proof: The main summation is over the regions in S. For each region s, the initial $2n^\alpha$ (or fewer)
queries take time $O(\log n)$, since during these queries s is never in the subset-biased structure. The
running times for the remaining queries (if any) are bounded using Lemma 6. The first additional
term comes from the $O(n^\beta \log n)$ construction cost of the subset-biased structure, incurred every $n^\alpha$
operations. The final term is the linear one-time cost to build the static structure. □



3.3 Technical Lemmas 

In order to simplify the bound in Lemma 7, we need two technical lemmas to deal with the various
terms. The first lemma shows how to simplify the summation for the later queries. 

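Lemma 8 Let S be a set of n regions, and let P be a query sequence on S of length m. For each
region $s \in S$, we have

$$\sum_{j=2n^\alpha}^{f(s)} \log \frac{t_j(s)}{f_{t_j(s)}(s) - n^\alpha} \leq f(s) \log \frac{2em}{f(s)}.$$

Proof: Since $f_{t_j(s)}(s) = j \geq 2n^\alpha$, we have $f_{t_j(s)}(s) - n^\alpha \geq j/2$. Also, $t_j(s) \leq m$. Thus,

$$\sum_{j=2n^\alpha}^{f(s)} \log \frac{t_j(s)}{f_{t_j(s)}(s) - n^\alpha} \leq \sum_{j=1}^{f(s)} \log_2 \frac{2m}{j} = f(s) \log_2(2m) - \log_2 f(s)! \leq f(s) \log_2(2m) - f(s) \log_2 \frac{f(s)}{e} = f(s) \log \frac{2em}{f(s)}.$$

Here, we used Stirling's formula to bound $f(s)! \geq (f(s)/e)^{f(s)}$, and the fact that $2m/j \geq 2$, so that the clamped and plain logarithms agree. □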
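The two inequalities used in this proof are easy to sanity-check numerically. The Python sketch below (function names hypothetical) compares $\log_2 f!$, computed via `math.lgamma`, with $f\log_2(f/e)$, and compares the intermediate sum $\sum_{j=1}^{f}\log_2(2m/j)$ with the final bound $f\log_2(2em/f)$ for all $f \le m$ at two values of m.

```python
import math

def stirling_ok(f):
    """Check log2(f!) >= f * log2(f / e), using lgamma(f + 1) = ln(f!)."""
    return math.lgamma(f + 1) / math.log(2) >= f * math.log2(f / math.e)

def lemma8_sum(f, m):
    """sum_{j=1}^{f} log2(2m / j): the intermediate bound in the proof."""
    return sum(math.log2(2 * m / j) for j in range(1, f + 1))

def lemma8_bound(f, m):
    """f * log2(2 e m / f): the bound stated in the lemma."""
    return f * math.log2(2 * math.e * m / f)

print(all(stirling_ok(f) for f in range(1, 2000)))
print(all(lemma8_sum(f, m) <= lemma8_bound(f, m) + 1e-9
          for m in (10, 1000) for f in range(1, m + 1)))
```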
The second lemma deals with the time for the initial queries. 
Lemma 9 Let γ be a constant with $\alpha < \gamma < 1$. If $m \geq n^\gamma$, then for every region $s \in S$,

$$\min(f(s), 2n^\alpha) \log n = O\Big(f(s) \log \frac{m}{f(s)}\Big).$$

Proof: Set $\delta := (\alpha + \gamma)/2$. If $f(s) \leq n^\delta$, the lemma holds since

$$f(s) \log \frac{m}{f(s)} \geq f(s) \log \frac{m}{n^\delta} = \Omega(f(s) \log n) = \Omega(\min(f(s), 2n^\alpha) \log n).$$

If $f(s) > n^\delta$, then

$$f(s) \log \frac{m}{f(s)} \geq f(s) > n^\delta \geq 2n^\alpha \log n,$$

for n large enough, as desired (recall that we defined log x to be at least 1). □

3.4 Main theorem 

We can now prove our main theorem. 

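Proof (of Theorem 5): By Definition 3, we need to prove that the execution time is

$$O\Big(n + \sum_{s \in S} f(s) \log \frac{m}{f(s)}\Big).$$

By Lemma 7, the running time is bounded by

$$O\Bigg(\sum_{s \in S}\Big(\min(f(s), 2n^\alpha) \log n + \sum_{j=2n^\alpha}^{f(s)} \log \frac{t_j(s)}{f_{t_j(s)}(s) - n^\alpha}\Big) + \Big\lfloor \frac{m}{n^\alpha} \Big\rfloor n^\beta \log n + n\Bigg).$$

We now apply Lemma 8 and note that $\lfloor m/n^\alpha \rfloor n^\beta \log n \leq m n^{\beta - \alpha} \log n = o(m)$, since $\beta < \alpha$, to obtain a running time bound of

$$O\Big(\sum_{s \in S}\Big(\min(f(s), 2n^\alpha) \log n + f(s)\Big(\log_2(2e) + \log \frac{m}{f(s)}\Big)\Big) + n + m\Big).$$

Since we defined log x to be at least 1, and since $\sum_{s \in S} f(s) = m$, this simplifies to

$$O\Big(\sum_{s \in S}\Big(\min(f(s), 2n^\alpha) \log n + f(s) \log \frac{m}{f(s)}\Big) + n\Big).$$

If $m < n^{1-\beta}$, the sum over $s \in S$ is $O(n^{1-\beta} \log n) = o(n)$. In this case, the bound simplifies
to $O(n)$, and the theorem is proved. Otherwise, if $m \geq n^{1-\beta}$, Lemma 9 applies with $\gamma := 1 - \beta$
(a legal choice by our assumption on α and β), and the term $\min(f(s), 2n^\alpha) \log n$ collapses into
$f(s) \log(m/f(s))$ to give the theorem. □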
4 Point location



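Theorem 10 There is a data structure for point location in a planar triangulation of size n that
can execute any query sequence of length m in time

$$O\Big(n + \sum_{s} f(s) \log \frac{m}{f(s)}\Big),$$

where the sum ranges over the triangles s of the triangulation.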

Proof: It is easy to apply our general transformation to the problem of planar point location in a
triangulation, as all of the required ingredients are well known. We assume that the triangulation is
given in a standard representation, such as a doubly-connected edge list (e.g., [11, §2.2]).

For the static structure with $O(\log n)$ query time and $O(n)$ construction time, Kirkpatrick's
algorithm [34] can be used. For the subset-biased structure, the provided subset of $n^\beta$ triangles may
not be a connected triangulation and thus needs to be triangulated; this takes time $O(n^\beta \log n)$ using
the classic line sweep approach [38]. This creates $O(n^\beta)$ new triangles, which are marked specially
and given small weights. The resultant triangulation and weighting is given to a biased structure
such as Iacono's [31]. The marking can be used to detect whether a query to the subset-biased
structure was successful. With all ingredients in hand, the claim now follows from Theorem 5. □
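The marking trick can be illustrated with a deliberately naive Python sketch: the retriangulated subset is a dictionary of labeled triangles, the labels of the sweep-generated triangles form a set `filler`, and a hit on a filler triangle is reported as a failure. The linear scan merely stands in for a biased structure such as Iacono's; all names here are hypothetical.

```python
def orient(a, b, p):
    """Twice the signed area of (a, b, p); positive iff p is left of a -> b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def contains(tri, p):
    """Closed triangle test: p is inside iff the three orientations agree in sign."""
    a, b, c = tri
    d = (orient(a, b, p), orient(b, c, p), orient(c, a, p))
    return all(x >= 0 for x in d) or all(x <= 0 for x in d)

def make_subset_locator(triangles, filler):
    """triangles: label -> (a, b, c), covering S' plus the new triangles from the
    sweep; filler: labels marked as new. A hit on a marked triangle is a failure."""
    def locate(p):
        for label, tri in triangles.items():     # stand-in for a biased structure
            if contains(tri, p):
                return None if label in filler else label
        return None                              # outside everything: also a failure
    return locate

tris = {"T": ((0, 0), (1, 0), (0, 1)), "F": ((1, 0), (1, 1), (0, 1))}
locate = make_subset_locator(tris, filler={"F"})
print(locate((0.2, 0.2)), locate((0.8, 0.8)))  # T None
```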

Our choice of structures reflects a desire for the strongest asymptotic bounds possible. Thus,
we have avoided structures that are randomized or that have non-linear construction cost; such
structures, however, have far better constants than the ones we use. If we took a data structure
for the static $O(\log n)$ time queries with an $O(n \log n)$ construction cost instead of $O(n)$, this would
simply change the linear additive term in Theorem 10 to $n \log n$.



5 Point location in polygonal subdivisions with non-constant sized cells

Our work applies to point location in triangulations. It can also be extended to polygonal subdivisions
where each region has constant complexity. Indeed, suppose every region has k + 2 edges. We can just
triangulate each region and then apply our result. As mentioned in the introduction, this operation
could increase the entropy of the query outcomes. However, the log sum inequality [19, Theorem 2.7.1]
implies that $\sum_{i=1}^{k} p_i \log(1/p_i) \leq p \log(k/p)$ for any nonnegative $p_1, p_2, \ldots, p_k$ and $p = \sum_{i=1}^{k} p_i$. Thus,
if we subdivide a region with probability p into k triangles, the entropy increases by at most $p \log k$.
It follows that the overall entropy grows by at most $\log k$, which is acceptable if k is constant.
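The entropy argument above can be checked numerically. The Python sketch below (hypothetical helper names, base-2 logarithms) computes the increase in entropy when a region of probability p is split into k pieces and confirms on random inputs that it never exceeds $p\log_2 k$, with equality for an even split.

```python
import math
import random

def h(ps):
    """Contribution sum p_i log2(1/p_i) of the given probabilities to the entropy."""
    return sum(p * math.log2(1 / p) for p in ps if p > 0)

def split_increase(ps):
    """Entropy increase when one region of probability p = sum(ps) is split into
    len(ps) pieces with probabilities ps."""
    p = sum(ps)
    return h(ps) - p * math.log2(1 / p)

rng = random.Random(0)
worst_gap = 0.0
for _ in range(10_000):
    k = rng.randint(1, 8)
    raw = [rng.random() + 1e-9 for _ in range(k)]
    p = rng.uniform(0.01, 1.0)                 # probability of the original region
    ps = [p * r / sum(raw) for r in raw]
    worst_gap = max(worst_gap, split_increase(ps) - p * math.log2(k))
print(worst_gap)  # essentially 0: no split exceeded p * log2(k)
```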

Recently, several data structures have been developed for optimal point location where the
distribution is known in advance for convex connected [Tt], connected [Is], and arbitrary polygonal
[14] subdivisions of the plane, as well as the more general odds-on trees [13]. Unfortunately, these
structures are not biased according to our definition, since entropy-based lower bounds are not
meaningful for them: a convex k-gon splits the plane into two regions, so the entropy of the query
outcomes is constant. Nonetheless, some distributions require $\Omega(\log n)$ time for a point location
query (in a reasonable model of computation that is more restrictive than the one described here).

The entropy-sensitive structures for non-triangulations all basically work by triangulating the 
given subdivision as a function of the provided probability distribution, and then using one of 
the biased structures on the resultant triangulation. The main conceptual problem in using our 
framework with such a structure is that it is unclear how to triangulate during the rebuilding process, 
since the optimal triangulation is not known in advance. One could imagine that triangulating
during each rebuild based on the observed queries so far would work well, but proving this would 
require a more complex and specialized analysis than what has been presented in this paper. 